Summary

MPEG has been outstandingly successful in defining standards for video compression 
coding, serving a wide range of applications, bit-rates, qualities and services on a 
worldwide basis. The standards are based upon a flexible toolkit of techniques for bit-rate 
reduction. MPEG video coding uses a combination of motion-compensated interframe 
prediction (for reducing temporal redundancy) with Discrete Cosine Transform (DCT) and
variable length coding tools (for reducing spatial redundancy). The specification only 
defines the bitstream syntax and decoding process: the coding process is not specified and 
the performance of a coder will vary depending upon, for example, the quality of the 
motion-vector measurement, and the processes used for prediction-mode selection. 
MPEG VIDEO CODING: A basic tutorial introduction

S.R. Ely, Ph.D. C.Eng., M.I.E.E. 



1. INTRODUCTION 

MPEG (Moving Picture Experts Group) started in
1988 as a Working Group of the International
Standards Organisation (ISO) with the aim of defining
standards for digital compression of video and audio
signals. It took as its basis the ITU-T standard for
video-conferencing and video-telephony*, together with
that of JPEG (Joint Photographic Experts Group), which
was initially developed for compressing still images
such as electronic photographs.

The first goal of MPEG was to define a video coding 
algorithm for digital storage media; in particular, for 
the CD-ROM. The resulting standard was published in 
1993.** It comprises three parts, covering: 

• systems aspects (including multiplexing and
synchronisation)

• video coding

• audio coding.

It has been applied in the Interactive CD (CDi) system 
to provide full motion video playback from CD, and is 
widely used in PC applications, for which a range of 
hardware and software coders and decoders are avail- 
able. This standard is known as MPEG-1 and is 
restricted to non-interlaced video formats; it is primar- 
ily intended to support video coding at bit-rates up to 
about 1.5 Mbit/s. 

In 1990, MPEG began work on a second standard, to 
be capable of coding interlaced pictures directly; origi- 
nally to support high-quality applications at bit-rates in 
the range of about 5 to 10 Mbit/s. MPEG-2, as it is
now known, also supports high definition formats at 
bit-rates in the range of about 15 to 30 Mbit/s. As for 
MPEG-1, the MPEG-2 standard (published in 
1994***) comprises three parts: systems, video
and audio.

It is important to note that the MPEG standards specify 
only the syntax and semantics of the bit-streams and 
the decoding process; they do not specify the encoding 
process. Much of the latter is left to the discretion of 
the coder designers and this gives scope for improve- 
ment as coding techniques are refined and new 
techniques developed. 

* This is now known as Working Group H261.
** As ISO/IEC 11172.
*** As ISO/IEC 13818.



2. VIDEO CODING PRINCIPLES 

A studio-quality 625-line component picture, when 
digitised according to ITU Recommendation 601/656 
(i.e. 4:2:2 sampling), requires 216 Mbit/s to convey the 
luminance and two chrominance sample components 
(see Fig. 1). For bandwidth-restricted media (such as 
terrestrial or satellite channels), some way must be
found of reducing the very high bit-rate needed to
represent the digitised picture.
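
As a check on the 216 Mbit/s figure, the arithmetic can be sketched as
follows. The sampling frequencies and word length used here are the usual
Rec. 601 values (13.5 MHz for luminance, 6.75 MHz for each chrominance
component, 8 bits per sample); they are not stated in the text above and
are taken as assumptions.

```python
# Assumed Rec. 601 4:2:2 figures (not given in the text above).
LUMA_RATE_HZ = 13_500_000      # luminance sampling frequency
CHROMA_RATE_HZ = 6_750_000     # each of the two chrominance components
BITS_PER_SAMPLE = 8

def total_bit_rate(luma_hz, chroma_hz, bits):
    """Gross bit-rate in bit/s for one luminance and two chrominance streams."""
    return (luma_hz + 2 * chroma_hz) * bits

rate_422 = total_bit_rate(LUMA_RATE_HZ, CHROMA_RATE_HZ, BITS_PER_SAMPLE)
print(rate_422 / 1e6, "Mbit/s")   # 216.0 Mbit/s
```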

A video bit-rate reduction (compression) system
operates by removing redundant and less
important information from the signal prior to trans- 
mission, and then reconstructing an approximation of 
the image from the remaining (compressed) informa- 
tion at the decoder. In video signals, three distinct 
kinds of redundancy can be identified: 

• Spatial and temporal redundancy: pixel val- 
ues are not independent but are correlated with 
their neighbours, both within the same frame and 
across frames. So, to some extent, the value of a 
pixel is predictable, given the values of neigh- 
bouring pixels. 

• Entropy redundancy: for any non-random dig- 
itised signal, some code values occur more 
frequently than others. This can be exploited by 
coding the more frequently-occurring values 
with shorter codes than would be used for the 
rarer values. This same principle has long been 
exploited in the Morse Code, where the com- 
monest letters in English ('E' and 'T') are 
represented by one dot and one dash respectively 
whereas the rarest ('X', 'Y' and 'Z') are repre- 
sented by four dots and dashes. 
• Psychovisual redundancy: this form of redundancy
results from the way the eye and brain
work. In audio, the limited frequency response of 
the ear is well appreciated. In video, both the 
limit to the fine detail which the eye can resolve 
(limits of spatial resolution) and the limits in 
ability to track fast-moving images (limits of 
temporal resolution), must be considered. The 
latter means, for example, that a shot-change 
masks fine detail on either side of the change. 
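
The entropy-redundancy principle in the list above can be illustrated with
a small Huffman code builder. This is only a sketch: the sample values are
hypothetical, and MPEG itself uses fixed, pre-defined code tables rather
than codes built from the data in this way.

```python
import heapq
from collections import Counter

def huffman_code_lengths(symbols):
    """Return {symbol: code length in bits} for a Huffman code built from the sample."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {next(iter(counts)): 1}
    # Heap items: (weight, tie-breaker, {symbol: depth so far}).
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: d + 1 for s, d in {**a, **b}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]

# Hypothetical sample: zero is common, large values are rare.
sample = [0] * 12 + [1] * 4 + [-1] * 3 + [5] * 1
lengths = huffman_code_lengths(sample)
variable_bits = sum(lengths[s] for s in sample)
fixed_bits = 2 * len(sample)   # four distinct values need 2 bits each
```

For this sample the commonest value receives a 1-bit code, and the whole
sequence takes 32 bits instead of the 40 bits needed with fixed 2-bit codes.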



3. MPEG VIDEO COMPRESSION TOOLKIT 

Sample-rate reduction is a very effective method of
reducing bit-rate, but of course introduces irreversible
loss of resolution. For very low bit-rate applications
(e.g. in MPEG-1), alternate fields are discarded and the
horizontal sampling-rate reduced to around 360 pixels
per line (giving about 3.3 MHz resolution). The sample
rate for the chrominance is half that of the luminance, 
both horizontally and vertically. In this way, the bit- 
rate can be reduced to less than one fifth of that of a 
conventional definition (4:2:2-sampled) signal. 
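
The "less than one fifth" figure can be checked with a rough calculation.
The sketch below compares active-picture bit-rates only; the 720 x 576
active picture size, 25 frames per second and 8 bits per sample are
assumptions that do not appear in the text above.

```python
def active_bit_rate(width, height, frames_per_s, chroma_h_div, chroma_v_div, bits=8):
    """Bit-rate in bit/s of the active picture: one luminance plus two chrominance arrays."""
    luma = width * height
    chroma = 2 * (width // chroma_h_div) * (height // chroma_v_div)
    return (luma + chroma) * frames_per_s * bits

# Assumed conventional-definition 4:2:2 picture: chrominance halved horizontally only.
full = active_bit_rate(720, 576, 25, chroma_h_div=2, chroma_v_div=1)

# Reduced format: one field per frame (288 lines), 360 pixels per line,
# chrominance halved both horizontally and vertically.
reduced = active_bit_rate(360, 288, 25, chroma_h_div=2, chroma_v_div=2)

ratio = reduced / full   # 0.1875, i.e. a little under one fifth
```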

For 'broadcast quality', at bit-rates in the range 3 to 
10 Mbit/s, horizontal sample-rate reduction is not ad- 
visable for the luminance or chrominance signals, nor 
is temporal sub-sampling. However, for distribution 
and broadcast applications, sufficient chrominance
resolution can be provided with the vertical
chrominance sampling frequency halved. Thus, for most
MPEG-2 coding applications, 4:2:0 sampling is likely
to be used, rather than 4:2:2, although the latter, and
4:4:4 sampling, are also supported. It may be of interest
to note that a conventional delay-line PAL decoder 
effectively yields the same vertical sub-sampling of the 
chrominance signals as 4:2:0 sampling. 

Apart from sample-rate reduction, the MPEG toolkit 
includes two different kinds of tools to exploit redun- 
dancy in images: 

• Discrete Cosine Transform (DCT), which is similar
to the Discrete Fourier Transform (DFT). The
purpose of using this orthogonal transform is to
assist the processing to remove spatial redundancy
by concentrating the signal energy into
relatively few coefficients.

• Motion-compensated interframe prediction, which is
used to remove temporal redundancy. This is
based on techniques similar to the well known
differential pulse-code modulation (DPCM)
principle.



3.1 Discrete cosine transform 

Consider the luminance signal of a 4:2:0-sampled
digitised 625-line picture comprising about 704 pixels
horizontally and about 576 lines vertically (see Fig. 2).
In MPEG coding, spatial redundancy is removed by 
processing the digitised signals in two-dimensional 
blocks of 8 pixels by 8 lines (taken from either one 
field or two, depending on the mode of operation). 

As Fig. 3 illustrates, the DCT is a reversible
process which maps between the normal 2D presentation
of the image and one which represents the same
information in what may be thought of as the
'frequency' domain. Each coefficient in the 8 x 8 DCT
domain block indicates the contribution of a different
DCT 'basis' function. The coefficient of the
lowest-frequency basis function (top-left in Fig. 3) is
called the DC coefficient, and may be thought of as
representing the average brightness of the block. Moving
down the block in Fig. 3, the coefficients represent
increasing vertical frequencies; and moving along the
block, from left to right, they represent increasing
horizontal frequencies.
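
A direct (slow) implementation of the 8 x 8 transform pair is sketched
below to make the mapping concrete. It assumes orthonormal scaling, under
which the DC coefficient equals eight times the mean pixel value of the
block; practical coders use fast algorithms rather than this
quadruple loop.

```python
import math

N = 8  # block size

def _c(k):
    # Orthonormal scaling: the zero-frequency term carries an extra 1/sqrt(2).
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

def _basis(pos, freq):
    return math.cos((2 * pos + 1) * freq * math.pi / (2 * N))

def dct_2d(block):
    """Forward 2D DCT. coeffs[v][u]: v = vertical, u = horizontal frequency."""
    return [[_c(u) * _c(v) * sum(block[y][x] * _basis(x, u) * _basis(y, v)
                                 for y in range(N) for x in range(N))
             for u in range(N)] for v in range(N)]

def idct_2d(coeffs):
    """Inverse 2D DCT, recovering the pixel block."""
    return [[sum(_c(u) * _c(v) * coeffs[v][u] * _basis(x, u) * _basis(y, v)
                 for v in range(N) for u in range(N))
             for x in range(N)] for y in range(N)]

flat = [[100] * N for _ in range(N)]
flat_coeffs = dct_2d(flat)     # only the DC (top-left) term is non-zero

ramp = [[16 * x for x in range(N)] for _ in range(N)]   # horizontal ramp
ramp_coeffs = dct_2d(ramp)     # energy appears only in the top row (v = 0)
ramp_back = idct_2d(ramp_coeffs)
```

A flat block produces a single non-zero coefficient, and a purely
horizontal ramp produces coefficients only along the top row, which is the
energy compaction exploited in the rest of this section.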

The DCT does not directly reduce the number of bits 
required to represent the block. In fact, for an 8 x 8
image block of 8-bit pixels, the DCT produces an 8 x 8
block of at least 11-bit DCT coefficients to allow for
reversibility! The reduction in the number of bits
follows from the fact that for typical blocks of natural
images, the distribution of coefficients is non-uniform 
- the transform tends to concentrate the energy into the 
low-frequency coefficients, and many of the other co- 
efficients are near zero. The bit-rate reduction is 
achieved by not transmitting the near-zero coefficients, 
and by quantising and coding the remaining
coefficients as described below. The non-uniform coefficient
distribution is a result of the spatial redundancy present 
in the original image block. 

Many different forms of transformation have been in- 
vestigated for bit-rate reduction. The best transforms 
are those which tend to concentrate the energy of a pic- 
ture block into a few coefficients. The DCT is one of 
the best transforms in this respect and has the advan- 
tage that the DCT and its inverse are easy to implement 
in digital processing. The choice of 8 x 8 block size is
a trade-off between the need to use a large picture area
for the transform, so that the energy compaction
described above is most efficient, and the fact
that the content and movement of the picture vary
spatially, which argues for a smaller block size.
A large block size would also emphasise
variations from block to block in the decoded picture,
and the effects of 'windowing' by the block structure.

Fig. 2 - Block-based DCT (88 blocks across the picture width).

Fig. 3 - DCT transform pairs.

3.2 Coefficient quantisation 

After a block has been transformed, the transform co- 
efficients are quantised. Different quantisation is 
applied to each coefficient depending on the spatial 
frequency within the block that it represents. The ob- 
jective is to minimise the number of bits which must 
be transmitted to the decoder so that it can perform the 
inverse transform and reconstruct the image: reduced 
quantisation accuracy reduces the number of bits 
which need to be transmitted to represent a given DCT 
coefficient, but increases the possible quantisation 
error for that coefficient. Note that the quantisation 
noise introduced by the coder is not reversible in the 
decoder, so the coding and decoding process is 'lossy'. 

More quantisation error can be tolerated in the high- 
frequency coefficients because high-frequency noise is 
less visible than low-frequency quantisation noise. 
Also, quantisation noise is less visible in the chromi- 
nance components than in the luminance component. 
MPEG uses weighting matrices to define the relative 
accuracy of the quantisation of the different coeffi- 
cients. Different weighting matrices can be used for 
different frames depending on the prediction mode 
used. 

The weighted coefficients are then passed through a 
fixed quantisation law which is usually a linear law. 
However, for some prediction modes there is an in- 
creased threshold level (i.e. a dead-zone) around zero. 
The effect of this threshold is to maximise the number 
of coefficients which are quantised to zero. In practice, 
it is found that small deviations around zero are
usually caused by noise in the signal, so that
suppressing these values actually gives an apparent
improvement to the subjective picture quality.
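
The weighting and dead-zone ideas can be sketched in a few lines. The
step-size rule used here (weight multiplied by an overall quantiser scale)
and the example weights are hypothetical simplifications for illustration;
they are not the arithmetic defined in the MPEG specification.

```python
import math

def quantise(coeff, weight, quantiser_scale, dead_zone=False):
    """Map a DCT coefficient to an integer level.

    The step size weight * quantiser_scale is a hypothetical rule.
    """
    ratio = coeff / (weight * quantiser_scale)
    if dead_zone:
        return int(ratio)   # truncation toward zero widens the zero bin
    sign = -1 if ratio < 0 else 1
    return sign * math.floor(abs(ratio) + 0.5)   # plain linear rounding

def reconstruct(level, weight, quantiser_scale):
    """Decoder side: scale the level back to a coefficient value."""
    return level * weight * quantiser_scale

# A small coefficient survives the linear law but falls in the dead-zone.
linear_level = quantise(11, weight=16, quantiser_scale=1)                   # 1
dead_level = quantise(11, weight=16, quantiser_scale=1, dead_zone=True)     # 0

# The same coefficient value quantised finely (low frequency, weight 8)
# and coarsely (high frequency, weight 32).
fine = reconstruct(quantise(100, 8, 2), 8, 2)       # 96
coarse = reconstruct(quantise(100, 32, 2), 32, 2)   # 128
```

The coarse weight needs fewer distinct levels, and so fewer bits, at the
cost of a larger reconstruction error (28 instead of 4 in this example).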

Quantisation noise is more visible in some blocks than 
in others - for example, in blocks which contain a 
high-contrast edge between two plain areas. In such 
blocks, the quantisation parameters can be modified to 
limit the maximum quantisation error, particularly in 
the high-frequency coefficients. 



3.3 Zig-zag coefficient scanning, run-length 
coding, and variable length coding 

After quantisation, the 8 x 8 blocks of DCT
coefficients are scanned in a zig-zag pattern (see Fig. 4)
to turn the 2D array into a serial string of
quantised coefficients. Two scan patterns are defined:
one is usually preferable for picture material which has 
strong vertical frequency components, due to, perhaps, 
the interlace picture structure. In this scan pattern there 
is a bias to scan vertical coefficients first. In the other, 
which is preferable for pictures without a strong verti- 
cal structure, there is no bias and the scan proceeds 
diagonally from top left to bottom right as illustrated in 
Fig. 4. The coder signals its choice of scan pattern to 
the decoder. 
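
The unbiased diagonal scan can be generated with a short routine, sketched
below. Only this conventional zig-zag order is shown; the alternative scan
with a vertical bias is defined by a table in the standard and is not
reproduced here.

```python
def zigzag_order(n=8):
    """Return (row, col) positions in diagonal zig-zag order, starting top-left."""
    order = []
    for s in range(2 * n - 1):               # s = row + col indexes each anti-diagonal
        diag = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        # Even diagonals are traversed bottom-left to top-right, odd ones the reverse.
        order.extend(reversed(diag) if s % 2 == 0 else diag)
    return order

def zigzag_scan(block):
    """Turn a 2D block into a 1D list of coefficients in zig-zag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]

# Label each position with its raster index to make the order visible.
raster = [[r * 8 + c for c in range(8)] for r in range(8)]
scanned = zigzag_scan(raster)
print(scanned[:10])   # [0, 1, 8, 16, 9, 2, 3, 10, 17, 24]
```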

The strings of coefficients produced by the zig-zag 
scanning are coded by counting the number of zero co- 
efficients preceding a non-zero coefficient; that is, they 
are run-length coded. The run-length value and the 
value of the non-zero coefficient which the run of zero 
coefficients precedes are then combined and coded using 
a variable length code (VLC). This VLC coding exploits 
the fact that short runs of zeros are more likely than 
long ones and small coefficients are more likely than large 
ones. The VLC allocates codes which have different 
lengths depending upon the expected frequency of 
occurrence of each zero-run-length/non-zero coeffi- 
cient value combination. Common combinations use 
short code words, less common combinations long
code words. All other combinations are coded by the
combination of an escape code and two fixed length
codes, one 6-bit word to indicate the run length and
one 12-bit word to indicate the coefficient value.

Fig. 4 - Scanning of DCT blocks and run-length coding with variable length codes (Entropy coding).
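
The run-length step described above can be sketched as a function that
turns a scanned coefficient string into (run, level) pairs ready for
variable length coding. The handling of trailing zeros (dropped here, on
the assumption that an end-of-block marker follows) and the example values
are illustrative assumptions.

```python
def run_level_pairs(coeffs):
    """Return (zero_run, level) pairs for a zig-zag scanned coefficient list.

    Trailing zeros are not coded; an end-of-block marker is assumed to follow.
    """
    pairs = []
    run = 0
    for c in coeffs:
        if c == 0:
            run += 1                 # extend the current run of zeros
        else:
            pairs.append((run, c))   # zeros preceding this non-zero value
            run = 0
    return pairs

# Hypothetical scanned block: a few low-frequency values, then zeros.
scanned_block = [12, 0, 0, -3, 1, 0, 0, 0, 2] + [0] * 55
pairs = run_level_pairs(scanned_block)
print(pairs)   # [(0, 12), (2, -3), (0, 1), (3, 2)]
```

Sixty-four coefficients are thus represented by four pairs, each of which
is then mapped to a variable length codeword.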

One VLC code table is used in most circumstances. 
However, a second VLC code table is used for some 
special pictures. The DC coefficient is treated
differently in some modes. Nevertheless, all the VLCs are designed
such that no complete codeword is the prefix of any 
other codeword - they are similar to the well-known 
Huffman code. Thus, the decoder can identify where 
one variable length codeword ends and another starts 
when operating within the correct codebook. No VLC, 
or combination of codes, is allowed to produce a 
sequence of 23 contiguous zeros - this combination is 
used for synchronisation purposes. 

DC coefficients within blocks in intra macro-blocks 
(see below) are differentially encoded before variable 
length coding. 

3.4 Buffering and feedback 

The DCT coefficient quantisation, run-length, and 
VLC coding processes produce a varying bit-rate 
which depends upon the complexity of the picture
information and the amount and type of motion in the
picture. To produce the constant bit-rate needed for
transmission over a fixed bit-rate system, a buffer is
needed to smooth out the variations in bit-rate. To
prevent overflow or underflow of this buffer, its
occupancy is monitored and feedback applied to the
coding processes to control the input to the buffer. The
DCT quantisation process is often used to provide 
direct control of the buffer's input. As the buffer be- 
comes full, the quantiser is made coarser to reduce the 
number of bits used to code each DCT coefficient; and 
as the buffer empties, the DCT quantisation is made 
finer. Other means of controlling the buffer occupancy 
may be used as well as, or instead of, control of the 
DCT coefficient quantisation. 

Fig. 5 shows a block diagram of a basic DCT codec,
with, in this example, the buffer occupancy controlled 
by feedback to the DCT coefficient quantisation. 
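
The feedback loop can be illustrated with a toy simulation. Everything in
this sketch is a hypothetical model chosen for simplicity: the bits
produced for a picture are taken to be its 'complexity' divided by the
quantiser scale, and the scale is a linear function of buffer fullness.
Real coders use considerably more elaborate rate-control strategies.

```python
def simulate_rate_control(frame_complexities, channel_bits_per_frame, buffer_size):
    """Return a list of (quantiser_scale, buffer_occupancy) after each frame."""
    occupancy = buffer_size // 2
    history = []
    for complexity in frame_complexities:
        fullness = occupancy / buffer_size
        q_scale = 1 + int(30 * fullness)      # fuller buffer -> coarser quantiser
        produced = complexity // q_scale      # toy model of bits generated
        occupancy += produced - channel_bits_per_frame
        occupancy = max(0, min(buffer_size, occupancy))
        history.append((q_scale, occupancy))
    return history

# Hypothetical figures: 100 kbit per frame leaves the buffer at a constant rate.
easy = simulate_rate_control([400_000] * 30, 100_000, 1_000_000)
hard = simulate_rate_control([2_000_000] * 30, 100_000, 1_000_000)
print(easy[-1][0], hard[-1][0])   # settled quantiser scale for each sequence
```

In this model the more demanding sequence settles at a fuller buffer and a
coarser quantiser, while both sequences leave the buffer at the same
constant rate.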

It is important to note that the final bit-rate at the
output of an MPEG video encoder can be freely varied. If
the output bit-rate is reduced, the buffer will empty
more slowly and the coder will automatically compensate
by, for example, making the DCT coefficient
quantisation coarser. Clearly, reducing the output
bit-rate reduces the quality of the decoded pictures.
Hence, to squeeze more TV channels into an r.f. channel
(e.g. by transmitting VHS-quality instead of standard
broadcast quality), the output bit-rate of the MPEG video
encoder can be readily reduced. Conversely, an HDTV
channel would demand a much higher bit-rate, and hence
greater r.f. channel space. There is, therefore, no need
to lock input sampling rates to channel bit-rates or
vice-versa.

Fig. 5 - Basic DCT coder.

3.5 Reduction of temporal redundancy: 
interframe prediction 

In order to exploit the fact that pictures often change
little from one frame to the next, MPEG includes
temporal prediction modes; that is, an attempt is made
to predict a frame to be coded from a previous
'reference' frame.

Fig. 6 illustrates a basic Differential Pulse Code
Modulation (DPCM) coder, in which only the differences
between the input and a prediction based on previous,
locally-decoded output are quantised and transmitted.
Note that the prediction cannot be based on previous
source pictures because the prediction has to be
repeatable in the decoder, where the source pictures are
not available. Consequently, the coder contains a local
decoder which reconstructs pictures exactly as they
would be in the destination decoder. The locally-decoded
output then forms the input to the predictor. In
interframe prediction, samples from one 'reference'
frame are used in the prediction of samples in other
frames.

Fig. 6 - Basic DPCM coder.

In MPEG coding, interframe prediction (which reduces
temporal redundancy) is combined with the
DCT and the variable length coding tools described
above (which reduce spatial redundancy) - see Fig. 7.
The coder subtracts the prediction from the input to
form a 'prediction-error' picture. The prediction error
is transformed with the DCT, the coefficients are
quantised, and these quantised values coded using a VLC.

The simplest interframe prediction is to predict a
block of samples from the same spatially-positioned
(co-sited) block in the reference frame. In this case, the
'predictor' would just comprise a delay of exactly one
frame as shown in Fig. 7. This makes a good prediction
for stationary regions of the image but is poor in
moving areas.

Fig. 7 - DCT with interframe prediction coder.
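
The DPCM loop with its local decoder can be sketched in one dimension,
with the previous reconstructed sample standing in for the previous frame.
The uniform quantiser and the sample values are assumptions made for
illustration.

```python
def dpcm_encode(samples, step):
    """Quantise the difference between each sample and the locally decoded prediction."""
    prediction = 0
    levels = []
    for s in samples:
        error = s - prediction
        level = round(error / step)     # quantised prediction error (transmitted)
        levels.append(level)
        prediction += level * step      # local decoder: same value the receiver will hold
    return levels

def dpcm_decode(levels, step):
    """Rebuild the signal by accumulating the de-quantised prediction errors."""
    prediction = 0
    out = []
    for level in levels:
        prediction += level * step
        out.append(prediction)
    return out

samples = [100, 102, 105, 105, 104, 120, 121, 119]   # hypothetical input
STEP = 4
levels = dpcm_encode(samples, STEP)
decoded = dpcm_decode(levels, STEP)
worst_error = max(abs(d - s) for d, s in zip(decoded, samples))
```

Because the coder predicts from its own decoded output rather than from
the source, the quantisation error never exceeds half a step and does not
accumulate from sample to sample.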



3.6 Motion-compensated interframe 
prediction 

A more sophisticated prediction method, known as 
motion-compensated interframe prediction, is to offset 
any translational motion which has occurred between 
the block being coded and the reference frame; and to 
use a shifted block from the reference frame as the
prediction (see Fig. 8).

One method of determining the motion that has oc- 
curred, between the block being coded and the 
reference frame, is a 'block-matching' search in which 
a large number of trial offsets are tested in the coder 
(see Fig. 9). The 'best' offset is selected on
the basis of a measurement of the minimum error be- 
tween the block being coded and the prediction. Since 
MPEG defines only the decoding process, not the en- 
coding, the choice of motion measurement algorithm is 
left to the designer of the coder; it is an area where 
considerable differences in performance occur between 
different algorithms and different implementations. A 
major requirement is to have a search area large 
enough to cover any motion that is present from frame 
to frame. However, increasing the size of the search 
area greatly increases the processing needed to find the 
best match - various techniques, such as 'hierarchical 
block matching', are used to try to overcome this 
dilemma. 
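
An exhaustive block-matching search can be sketched as follows. The error
measure used here, the sum of absolute differences (SAD), the search range
and the synthetic test frames are all assumptions made for illustration;
as noted above, the choice of algorithm is left to the coder designer.

```python
import random

def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between a block and a shifted reference block."""
    return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
               for y in range(size) for x in range(size))

def full_search(cur, ref, bx, by, size=16, search=7):
    """Try every offset within +/-search pixels; return (best_sad, dx, dy)."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            inside = (0 <= by + dy <= height - size and
                      0 <= bx + dx <= width - size)
            if not inside:
                continue                      # candidate falls outside the picture
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best

# Synthetic test: the current frame is the reference moved 3 pixels right, 2 up.
rng = random.Random(1)
W = H = 48
ref_frame = [[rng.randrange(256) for _ in range(W)] for _ in range(H)]
cur_frame = [[ref_frame[(y + 2) % H][(x - 3) % W] for x in range(W)]
             for y in range(H)]
match = full_search(cur_frame, ref_frame, bx=16, by=16)
print(match)   # (0, -3, 2): zero error at the offset pointing back to the source block
```

Even this small example evaluates 225 candidate offsets for one
macroblock, which shows why the search cost grows so quickly with the
size of the search area.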

Bi-directional prediction (see Fig. 10) consists
of forming a prediction from both the previous
frame and the following frame by a linear combination 
of these, shifted according to suitable motion 
estimates. 

Bi-directional prediction is particularly useful where
motion uncovers areas of detail. However, to enable
backward prediction from a future frame, the coder
re-orders the pictures so they are transmitted in a
different order from the displayed order. This process,
and the reordering to the correct display order in the
decoder, introduces considerable end-to-end processing
delay which may be a problem in some applications. To
overcome this, MPEG defines a profile (see below)
which does not use bi-directional prediction.

Fig. 8 - Motion-compensated interframe prediction.

Fig. 9 - Principle of block-matching motion.

Whereas the basic coding unit for spatial redundancy
reduction in MPEG is based on an 8 x 8 block,
motion-compensation in MPEG is usually based on a 16 pixel
by 16 line macroblock. The size of the macroblock is
a trade-off between the need to minimise the bit-rate
needed to transmit the motion representation (known
as 'motion vectors') to the decoder, which argues for a
large macroblock size, and the need to adapt the
prediction process locally to the picture content and
movement, which argues for a small macroblock size.

To minimise the bit-rate needed to transmit the motion 
vectors, they are differentially encoded with reference 
to previous motion vectors. The motion vector value 
'prediction error' is then variable-length coded using 
another VLC table. 

Fig. 11 shows a conceptual motion-compensated inter- 
frame DCT coder in which, for simplicity, the 
implementation of the process of motion-compensated 
prediction is illustrated by suggesting the use of a vari- 
able delay. In practical implementations, of course, the 
motion-compensated prediction is implemented in 
other ways. 

3.7 Prediction modes 

In an MPEG-2 coder, the motion-compensated predictor
supports many methods for generating a prediction.
For example, a macroblock may be 'forward predicted'
from a past picture (P coded), or, in bi-directionally
coded (B) pictures, also 'backward predicted' from
a future picture or 'interpolated' by averaging
a forward and backward prediction.
Another option is to make a zero value prediction,
such that the source image block, rather than the
prediction error-block, is DCT-coded. Such macroblocks
are known as 'intra' or I coded. Intra macroblocks can
carry motion vector information, although no prediction
information is needed. The motion vector
information for an I-macroblock is not used in normal
circumstances, but its function is to provide a means of
concealing decoding errors when data errors in the
bit-stream make it impossible to decode the data for that
macroblock.

Fig. 10 - Motion-compensated bi-directional prediction.

Field or frame prediction coding may be used. Fields 
of a frame may be predicted separately from their own 
motion vector (field prediction coding), or together 
using a common motion vector (frame prediction cod- 
ing). Generally, for image sequences where the motion 
is slow, frame prediction coding is more efficient. 
However, when motion speed increases, field predic- 
tion coding becomes more efficient. 

In addition to the two basic modes of field and frame 
prediction, two further modes have been defined: 

16 x 8 motion compensation uses at least two motion
vectors for each macroblock: one vector is used for the
upper 16 x 8 region and one for the lower half. (In the
case of B-pictures (see below) a total of four motion
vectors are used for each macroblock in this mode,
since both the upper and lower regions may each have
motion vectors referring to past and future pictures.)
This mode is permitted only in field-structured
pictures and, in such cases, is intended to allow the
spatial area that is covered by each motion vector to be
approximately equal to that of a 16 x 16 macroblock in a
frame-structured picture.



Dual prime mode may be used in both field- and 
frame-structured coding but is only permitted in P- 
pictures (see below) when there have been no 
B-pictures between the P-picture and its reference 
frame. In this case, a motion vector and a differential 
offset motion vector are transmitted. For field pictures, 
two motion vectors are derived from this data and are 
used to form two predictions from two reference fields. 
These two predictions are combined to form the final 
prediction. For frame pictures, this process is repeated 
for each of the two fields; each field is predicted sepa- 
rately, giving rise to a total of four field predictions 
which are combined to form the final two predictions. 
Dual prime mode is used as an alternative to bi-direc- 
tional prediction, where low delay is required; it avoids 
the frame re-ordering needed for bi-directional predic- 
tion but achieves similar coding efficiency. 

For each macroblock to be coded, the coder chooses 
between these prediction modes, trying to minimise the 
distortions on the decoded picture within the con- 
straints of the available channel bit-rate. The choice of 
prediction mode is transmitted to the decoder, together 
with the prediction error, so that it can regenerate the 
correct prediction. 

Fig. 12 illustrates how a bi-directionally
coded macroblock (a 'B' macroblock) is decoded. The 
switches illustrate the various prediction modes avail- 
able for such a macroblock. Note that the coder has the 
option not to code some macroblocks; no DCT coeffi- 
cient information is transmitted for those blocks, and 
the macroblock address counter skips to the next coded
macroblock. The decoder output for the uncoded mac- 
roblocks simply comprises the predictor output. 



3.8 Picture types 

In MPEG-2, three 'picture types' are defined (see
Fig. 13). The picture type defines which prediction
modes may be used to code each macroblock.

3.8.1 Intra pictures 

Intra pictures (I-pictures) are coded without reference 
to other pictures. Moderate compression is achieved by 
reducing spatial redundancy but not temporal
redundancy. They are important as they provide access
points in the bit-stream where decoding can begin 
without reference to previous pictures. 

3.8.2 Predictive pictures 

Predictive pictures (P-pictures) are coded using mo- 
tion-compensated prediction from a past I- or P-picture 
and may be used as a reference for further prediction. 
By reducing spatial and temporal redundancy, P-pictures 
offer increased compression compared to I-pictures. 

Fig. 12 - Decoding a 'B' macroblock.



3.8.3 Bi-directionally-predictive pictures 

Bi-directionally-predictive pictures (B-pictures) use
both past and future I- or P-pictures for motion
compensation, and offer the highest degree of compression.
As noted above, to enable backward prediction from a
future frame, the coder re-orders the pictures from
natural display order to 'transmission' (or 'bitstream')
order so that the B-picture is transmitted after the past
and future pictures which it references (see Fig. 14).
This introduces a delay which depends upon the 
number of consecutive B-pictures. 

3.8.4 Group of pictures 

The different picture types typically occur in a repeat- 
ing sequence termed a Group of Pictures or GOP. A 
typical GOP is illustrated in display order in Fig. 14(a), 
and in transmission order in Fig. 14(b). 

A regular GOP structure can be described with two
parameters: N, which is the number of pictures in
the GOP; and M, which is the spacing of P-pictures.
The GOP illustrated in Fig. 14 is described as
N = 9 and M = 3.
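
The re-ordering from display order to transmission order can be sketched
as below for N = 9 and M = 3. Since Fig. 14 is not reproduced here, the
sketch assumes that the GOP begins with its I-picture in display order;
the actual figure may show a different starting point.

```python
def display_order_types(n, m, count):
    """Picture types in display order for a regular GOP (assumed to start with I)."""
    types = []
    for i in range(count):
        pos = i % n
        if pos == 0:
            types.append('I')
        elif pos % m == 0:
            types.append('P')
        else:
            types.append('B')
    return types

def transmission_order(types):
    """Reorder so each B-picture follows both of its reference pictures."""
    out = []
    pending_b = []
    for index, t in enumerate(types):
        if t == 'B':
            pending_b.append(f"B{index}")   # held back until the next reference is sent
        else:
            out.append(f"{t}{index}")
            out.extend(pending_b)
            pending_b = []
    out.extend(pending_b)   # any B-pictures left at the end of this sample
    return out

types = display_order_types(9, 3, 10)   # one GOP plus the next I-picture
sent = transmission_order(types)
print(''.join(types))   # IBBPBBPBBI
print(sent)
# ['I0', 'P3', 'B1', 'B2', 'P6', 'B4', 'B5', 'I9', 'B7', 'B8']
```

Each B-picture is delayed until the following I- or P-picture has been
sent, which is the source of the delay mentioned in Section 3.8.3.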



For a given decoded picture quality, coding using each 
picture type produces a different number of bits. In a 
typical sequence, a coded I-picture needs three times 
more bits than a coded P-picture, which itself occupies 
50% more bits than a coded B-picture. 
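
These ratios give a rough picture of how the bits in a GOP are shared out.
The sketch below reads 'three times more' as a factor of three, and the
5 Mbit/s bit-rate and 25 pictures per second are hypothetical figures
chosen only for the example.

```python
def gop_bit_budget(bit_rate, frame_rate, n, m, i_to_p=3.0, p_to_b=1.5):
    """Split the bits of one GOP between picture types using fixed size ratios."""
    n_i = 1
    n_p = (n - 1) // m            # P-pictures at spacings of m within the GOP
    n_b = n - n_i - n_p
    gop_bits = bit_rate * n / frame_rate
    # Express every picture in units of one B-picture.
    units = n_i * i_to_p * p_to_b + n_p * p_to_b + n_b
    b_bits = gop_bits / units
    return {'I': b_bits * i_to_p * p_to_b, 'P': b_bits * p_to_b, 'B': b_bits,
            'counts': (n_i, n_p, n_b)}

budget = gop_bit_budget(5_000_000, 25, n=9, m=3)
# Roughly 600 kbit for the I-picture, 200 kbit per P-picture and
# 133 kbit per B-picture under these assumptions.
```

With N = 9 and M = 3, the single I-picture takes about one third of the
bits in the GOP.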



4. MPEG PROFILES AND LEVELS 

MPEG-2 is intended to be generic, supporting a di- 
verse range of applications. Different algorithmic 
elements or 'tools', developed for many applications, 
have been integrated into a single bit-stream syntax.

Implementing the full syntax in all decoders is unnec- 
essarily complex, so a small number of subsets, or 
profiles of the full syntax, have been defined. Also, 
within a given profile, a level is defined which speci- 
fies a set of constraints on parameters within the 
profile, such as maximum sampling density. The pro- 
files defined to date fit together in such a way that a 
higher profile is a superset of a lower one. A decoder 
which supports a particular profile and level is only
required to support the corresponding subset of the full
syntax and a set of parameter constraints. To restrict
the number of options which must be supported, only
selected combinations of profile and level are defined
as conformance points (see Table 1).

Fig. 13 - MPEG picture types: intra-coded (I), predictive-coded (P) and bidirectionally predictive (B) pictures.

Fig. 14 - Example of group of pictures (GOP).

The profiles are as follows:

• Simple profile uses no B-frames, and hence no
backward or interpolated prediction. Consequently,
no picture re-ordering is required, which
makes this profile suitable for low-delay
applications such as video conferencing.

• Main profile adds support for B-pictures, which
improves the picture quality for a given bit-rate
but increases delay. Currently, most MPEG-2
video decoder chip-sets support main profile.

• SNR profile adds support for enhancement layers
of DCT coefficient refinement using Signal-to-Noise
Ratio (SNR) scalability.

• Spatial profile adds support for enhancement
layers carrying the image at different resolutions
using the spatial scalability tool.

• High profile adds support for 4:2:2-sampled
video.

Table 1: MPEG profiles and levels.

                  Profile and maximum total bit-rate (Mbit/s)
Level (maximum    Simple      Main        SNR              Spatially        High
density H/V/F                             Scalable         Scalable
samples)
----------------  ----------  ----------  ---------------  ---------------  --------------
High              -           MP@HL       -                -                HP@HL
(1920/1152/60)                80 Mbit/s                                     100 Mbit/s
                                                                            + lower layers

High-1440         -           MP@H-14     -                Spt@H-14         HP@H-14
(1440/1152/60)                60 Mbit/s                    60 Mbit/s        80 Mbit/s
                                                           + lower layers   + lower layers

Main              SP@ML       MP@ML       SNR@ML           -                HP@ML
(720/576/30)      15 Mbit/s   15 Mbit/s   15 Mbit/s                         20 Mbit/s
                                          + lower layers                    + lower layers

Low               -           MP@LL       SNR@LL           -                -
(352/288/30)                  4 Mbit/s    4 Mbit/s

ISO 11172 (MPEG-1): 1.856 Mbit/s

Notes: All decoders shall be able to decode ISO/IEC 11172 bitstreams.
SP@ML decoders are required to decode MP@LL bitstreams.

All MPEG-2 decoders will also decode MPEG-1 pic- 
tures (but not vice-versa). 



5. CONCLUSIONS 

MPEG has been outstandingly successful in defining 
standards for video compression coding by serving a 
wide range of applications, bit-rates, qualities and 
services. The standards are based upon a flexible
toolkit of techniques for bit-rate reduction. The
specification only defines the bitstream syntax and decoding
process; the coding process is not specified and the 
performance of a coder will vary depending upon, for 
example, the quality of the motion-vector measurement,
and the processes used for prediction mode selection. 

The picture quality through an MPEG codec depends 
strongly upon the picture content; but as experience 
with MPEG coding grows, the bit-rate needed for a 
given picture quality is likely to reduce.